Gene Expression Profiles in Normal and Cancer Cells

This invention was made with support from the National Institutes of 
Health, Grant No. GM07309, CA57345, and CA62924. The U.S. government 
therefore retains certain rights in the invention. 

TECHNICAL FIELD OF THE INVENTION

This invention is related to the diagnosis of cancer, and tools for 
carrying out such diagnosis. 

BACKGROUND OF THE INVENTION

Much of cancer research over the past 50 years has been devoted to the
analyses of genes that are expressed differently in tumor cells compared to their
normal counterparts. Although hundreds of studies have pointed out
differences in the expression of one or a few genes, no comprehensive study of
gene expression in the cancer cell has been reported. It is therefore not known
how many genes are expressed differentially in tumor versus normal cells,
whether the bulk of these differences are cell autonomous rather than being
dependent on the tumor microenvironment, and whether most differences are
cell-type specific or tumor specific. Thus there is a need in the art for
information on the molecular changes that occur in cells during cancer
development and progression.

SUMMARY OF THE INVENTION

According to one embodiment of the invention, a method is provided 
for diagnosing colon cancer in a sample suspected of being neoplastic. The
method comprises the steps of:

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 3; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be lower in the first sample than in the second 
sample. 

According to another embodiment of the invention, another method is 
provided for diagnosing colon cancer in a sample suspected of being neoplastic. 
The method comprises the steps of:

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the
group consisting of those shown in Table 2;

identifying the first sample as neoplastic when the level of the
at least one transcript is found to be higher in the first sample than in the
second sample. 
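
The two comparison rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_neoplastic` and the `direction` argument are hypothetical, and the text does not specify any normalization, threshold, or statistical test, so the sketch uses a bare higher/lower comparison of two already comparable levels.

```python
def is_neoplastic(test_level, normal_level, direction):
    """Apply the higher/lower comparison described in the text.

    direction is a hypothetical argument:
      "higher" for tags of the kind listed in Table 2 (increased in cancer),
      "lower"  for tags of the kind listed in Table 3 (decreased in cancer).
    Both levels are assumed to be measured on the same scale.
    """
    if direction == "higher":
        return test_level > normal_level
    if direction == "lower":
        return test_level < normal_level
    raise ValueError("direction must be 'higher' or 'lower'")
```

In practice a margin or significance test would likely be applied before calling a sample neoplastic; none is stated here, so none is assumed.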

In another embodiment of the invention an isolated and purified human 
nucleic acid molecule is provided. The molecule comprises a SAGE tag 
selected from SEQ ID NO: 1-732. 

In yet another aspect of the invention an isolated nucleotide probe is 
provided. The probe comprises at least 12 nucleotides of a human nucleic acid 
molecule, wherein the human nucleic acid molecule comprises a SAGE tag 
selected from SEQ ID NO: 1-732.

According to another aspect of the invention a method is provided for
diagnosing pancreatic cancer in a sample suspected of being neoplastic. The 
method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a pancreatic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colon tissue, wherein said transcript is identified by a tag selected from the 
group consisting of those shown in Table 4;

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

According to still another embodiment of the invention a method of 
diagnosing cancer in a sample suspected of being neoplastic is provided. The 
method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a tissue suspected of 
being neoplastic and the second sample is of a normal human tissue, wherein 
said transcript is identified by a tag selected from the group consisting of those 
shown in Table 5;

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

According to another embodiment of the invention a method is 
provided to aid in the determination of a prognosis for a colon cancer patient. 
The method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic colonic 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 3; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be lower in the first sample than in the second sample. 

According to another aspect of the invention a method to aid in 
determining a prognosis for a patient with colon cancer is provided. The
method comprises the steps of:

comparing the level of at least one transcript in a first tissue 
sample to a second sample, wherein the first sample is of a colonic cancer 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those
shown in Table 2;

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

In yet another embodiment of the invention a method is provided for 
diagnosing colon cancer in a sample suspected of being neoplastic. The
method comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
3; 

identifying the first sample as neoplastic when the level of 
expression of the protein is found to be lower in the first sample than in the 
second sample. 

In another aspect of the invention a method of diagnosing colon cancer 
in a sample suspected of being neoplastic is provided. The method comprises 
the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table
2;

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

According to another embodiment of the invention a method is 
provided to aid in determining a prognosis of a patient having pancreatic 
cancer. The method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic 
pancreatic tissue and the second sample is of a normal human colon tissue, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 4;

determining a poorer prognosis if transcription is found to be 
higher in the first sample than in the second sample. 

In yet another aspect of the invention a method to aid in providing a 
prognosis for a cancer patient is provided. The method comprises the steps of: 
comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic tissue 
and the second sample is of a normal human tissue of the same tissue type, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 5;

determining a poorer prognosis if transcription is found to be 
higher in the first sample than in the second sample. 

According to still another aspect of the invention, a method is provided 
for diagnosing pancreatic cancer in a sample suspected of being neoplastic. 
The method comprises the steps of: 

comparing the level of expression of at least one protein 
encoded by a transcript in a first sample of a tissue to a second sample, wherein 
the first sample is of a pancreatic tissue suspected of being neoplastic and the 
second sample is of a normal human colon tissue, wherein said protein is 
encoded by a transcript identified by a tag selected from the group consisting
of those shown in Table 4;

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

According to yet another aspect of the invention a method is provided 
for diagnosing cancer in a sample suspected of being neoplastic. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
tissue suspected of being neoplastic and the second sample is of a normal 
human tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5;

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

In still another embodiment of the invention a method is provided to aid 
in the determination of a prognosis of a colon cancer patient. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic colonic tissue and the second sample is of a normal human colonic 
tissue, and wherein the protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 3; 

determining a poorer prognosis if the level of expression is
found to be lower in the first sample than in the second sample. 

In still another embodiment of the invention a method is provided to aid 
in determining a prognosis for a patient with colon cancer. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first tissue sample to a second sample, wherein the first sample is of a colonic 
cancer tissue and the second sample is of a normal human colonic tissue, and 
wherein the protein is encoded by a transcript identified by a tag selected from
the group consisting of those shown in Table 2; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

In still another aspect of the invention a method is provided to aid in
determining a prognosis of a patient having pancreatic cancer. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic pancreatic tissue and the second sample is of a normal human colon
tissue, wherein said protein is encoded by a transcript identified by a tag
selected from the group consisting of those shown in Table 4;

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

According to even a further aspect of the invention a method is
provided to aid in providing a prognosis for a cancer patient. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic tissue and the second sample is of a normal human tissue of the same
tissue type, wherein said protein is encoded by a transcript identified by a tag
selected from the group consisting of those shown in Table 5;

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

In still another embodiment of the invention a method of treating a
cancer cell is provided. The method comprises the step of:

administering to a cancer cell an antibody which specifically 
binds to a protein encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Tables 2, 4, and 5, wherein the antibody is 
linked to a cytotoxic agent.

In another aspect of the invention an antibody linked to a cytotoxic
agent is provided. The antibody specifically binds to a protein encoded by a 
transcript identified by a tag selected from the group consisting of those shown
in Tables 2, 4, and 5.

According to another aspect of the invention, a method of detecting 
colon cancer in a patient is provided. The method comprises the steps of: 

comparing the level of at least one protein or transcript in a first 
body sample to a second body sample, wherein the first sample is a body 
sample of the patient and the second sample is of a normal human, wherein the 
protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 2, wherein the first 
and second body sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

In another aspect of the invention a method of detecting pancreatic 
cancer in a patient is provided. The method comprises the steps of: 

comparing the level of at least one protein or transcript encoded 
by a transcript in a first sample of a tissue to a second sample, wherein the first 
sample is of the patient and the second sample is of a normal human, wherein 
said protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 4, wherein the first
and second sample is a sample selected from the group consisting of blood,
urine, feces, sputum, and serum;

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

Also provided by the present invention is a method of detecting cancer
in a patient. The method comprises the steps of:

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of the patient and the
second sample is of a normal human, wherein said protein is encoded by a 



WO 98/53319 

PCT/US98/10277

transcript and the transcript is identified by a tag selected from the group 
consisting of those shown in Table 5, wherein the first and second body sample
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

Additionally provided by the present invention is a method to aid in the 
determination of a prognosis for a colon cancer patient. The method comprises
the steps of: 

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a colon cancer patient 
and the second sample is of a normal human, wherein the protein is encoded 
by a transcript and the transcript is identified by a tag selected from the group 
consisting of those shown in Table 3, wherein the first and second body sample
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

determining a poorer prognosis if the level of the at least one 
protein or transcript is found to be lower in the first sample than in the second 
sample. 

Provided by another embodiment of the invention is a method to aid 
in determining a prognosis for a patient with colon cancer. The method 
comprises the steps of: 

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a colonic cancer 
patient and the second sample is of a normal human, wherein the protein is 
encoded by a transcript and the transcript is identified by a tag selected from 
the group consisting of those shown in Table 2, wherein the first and second 
sample is a sample selected from the group consisting of blood, urine, feces, 
sputum, and serum; 

determining a poorer prognosis if the level of the at least one
protein or transcript is found to be higher in the first sample than in the second 
sample. 

According to still another aspect of the invention, a method to aid in 
determining a prognosis of a patient having pancreatic cancer is provided. The
method comprises the steps of:

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a pancreatic cancer 
patient and the second sample is of a normal human, wherein said protein is 
encoded by a transcript and the transcript is identified by a tag selected from 
the group consisting of those shown in Table 4, wherein said first and second
sample is a sample selected from the group consisting of blood, urine, feces,
sputum, and serum;

determining a poorer prognosis if the level of the at least one 
protein or transcript is found to be higher in the first sample than in the second 
sample. 

Also provided by the present invention is a method to aid in providing 
a prognosis for a cancer patient. The method comprises the steps of: 

comparing the level of expression of at least one protein or 
transcript in a first sample to a second sample, wherein the first sample is of a 
cancer patient and the second sample is of a normal human, wherein said 
protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 5, wherein the first
and second sample is a sample selected from the group consisting of blood, 
urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein or transcript is found to be higher in the first sample than in the second 
sample. 

The present invention further includes antisense oligonucleotides 
complementary in whole or in part to SEQ ID NOS: 1-732. 

This invention also provides a method for screening for candidate
agents that modulate the expression of a polynucleotide selected from the group
consisting of the polynucleotides in SEQ ID NOS: 1-732 or their respective
complements, by contacting a test agent with a pancreatic or colon cell and 
monitoring expression of the polynucleotide, wherein the test agent which 
modifies the expression of the polynucleotide is a candidate agent. 

The present invention provides the art with new methods and reagents
for diagnosing and prognosing cancers. In addition, some of the newly 
disclosed genes may play an important role in the development of cancers. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1. Comparison of expression patterns in colorectal cancers and normal 
colon epithelium. (FIG. 1A) A semi-logarithmic plot reveals 51 tags that 
were decreased more than 10 fold in primary CR cancer cells whereas 32 tags 
were increased more than 10 fold. 62,168 and 60,878 tags derived from 
normal colon epithelium and primary CR cancers, respectively, were used for 
this analysis. The relative expression of each transcript was determined by 
dividing the number of tags observed in tumor and normal tissue as indicated. 
To avoid division by 0, a tag value of 1 was used for any tag that was not 
detectable in one of the samples. These ratios were then rounded to the 
nearest integer and their distribution plotted on the abscissa. The number of 
genes displaying each ratio was plotted on the ordinate. Tu: CR tumors; NC: 
Normal colon. (FIG. 1B and FIG. 1C) Differentially expressed genes in
colorectal cancers. The number of transcripts found to be differentially
expressed (P < 0.01) are presented as Venn diagrams. Diagrams of transcripts
that were decreased (FIG. 1B) or increased (FIG. 1C) in CR cancers
compared to normal colon epithelium. Comparisons were between primary 
tumors and cells in culture as indicated. 
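
The ratio calculation described for FIG. 1A can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the sign convention (positive for increased in tumor, negative for decreased), the half-up rounding of ties, and the example tag names are all assumptions. The legend does not say whether counts were normalized for the slightly different library sizes, so no normalization is applied.

```python
from collections import Counter

def round_half_up(x):
    """Round a positive number to the nearest integer, with .5 rounding up."""
    return int(x + 0.5)

def fold_change(tumor_count, normal_count):
    """Integer fold change between tumor and normal tag counts.

    A count of 0 is replaced by 1 to avoid division by zero, as in the legend.
    Assumed sign convention: positive = increased in tumor (Tu/NC),
    negative = decreased in tumor (NC/Tu).
    """
    tumor = max(tumor_count, 1)
    normal = max(normal_count, 1)
    if tumor >= normal:
        return round_half_up(tumor / normal)
    return -round_half_up(normal / tumor)

def fold_distribution(tag_counts):
    """Count how many tags show each integer fold change.

    tag_counts maps a tag identifier to a (tumor_count, normal_count) pair.
    """
    return Counter(fold_change(t, n) for t, n in tag_counts.values())
```

Calling `fold_distribution` on a mapping such as `{"TAG_A": (0, 12), "TAG_B": (30, 2)}` (hypothetical tags) gives the number of tags at each fold value, which is the quantity plotted on the ordinate.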

Fig. 2. Northern blot analysis of genes differentially expressed in 
gastrointestinal neoplasia. Northern blot analysis was performed on total RNA 
(5 µg) isolated from primary CR carcinomas (T) and matching normal colon
epithelium (N), or pancreatic carcinomas. The top panel in each case shows an
example of the ethidium bromide stained gels prior to transfer. The number of
SAGE tags observed in the original analysis is indicated to the right of each 
blot. (FIG. 2A) Examples of transcripts that were decreased or increased in 
CR cancers. (FIG. 2B) Examples of transcripts increased in pancreatic cancers
(10). (FIG. 2C) Examples of transcripts elevated in cancer which were or
were not cancer type specific. Probes used for Northern blot analysis were as 
follows (Human SAGE Tag unique identifier, gene name, (GenBank accession 
number)): (FIG. 2A) H204104, Guanylin (M95714); H259108, (see Table 2); 
H1000193, (see Table 2); H998030, (see Table 2). (FIG. 2B) H294155, 
RIG-E (U42376); H560056, TIMP-1 (S68252). (FIG. 2C) H802810, 
EST338411 (W52120); H85882, 1-8D (X57351); H618841, GA733-1 
(X13425). 

Tables 2-5. Transcripts Differentially Expressed in Human Cancer. 
Tag sequence represents the NlaIII site plus the adjacent 11 bp SAGE tag.
Tag number indicates a SAGE UID (unique identifier). NC, TU, CL, PT, PC, 
refers to the number of the indicated tag observed in RNA isolated from 
normal colorectal epithelium, primary colorectal cancers, colorectal cancer cell 
lines, primary pancreatic cancers, or pancreatic cancer cell lines, respectively. 
The Accession and Gene Name refer to representative GenBank entries that 
contain the tag sequence. 

Table 2 Transcripts increased in colorectal cancer.

Table 3 Transcripts decreased in colorectal cancer. 

Table 4 Transcripts increased in pancreatic cancer. 

Table 5 Transcripts increased in pancreatic and colorectal cancer. 

DETAILED DESCRIPTION

The inventors have discovered sets of human genes which are either 
upregulated or downregulated in cancer cells, as compared to normal cells. 
Specifically, certain genes have been found to be upregulated or downregulated 
in colorectal and/or pancreatic cancer cells, when compared to normal colon 
cells. These sets of differentially regulated genes can be used as diagnostic
markers, either individually or in sets of, for example, 2, 5, 10, 20, or 30.

Genes whose expression was detected to be increased in colorectal 
cancer are shown in Table 2. Genes whose expression was detected to be 
decreased in colorectal cancer are shown in Table 3. Genes whose expression 
was detected as increased in pancreatic cancer are shown in Table 4. Genes 
whose expression was detected as increased in both pancreatic cancer and 
colorectal cancer are shown in Table 5. These latter genes likely play a role in 
neoplastic development generally. 

Tag sequences, as provided herein, uniquely identify genes. This is due 
to their length, and their specific location (3') in a gene from which they are
drawn. The full length genes can be identified by matching the tag to a gene 
data base member, or by using the tag sequences as probes to physically isolate 
previously unidentified genes from cDNA libraries. The methods by which 
genes are isolated from libraries using DNA probes are well known in the art. 
See, for example, Velculescu et al., Science 270: 484 (1995), and Sambrook et
al. (1989), MOLECULAR CLONING: A LABORATORY MANUAL, 2nd 
ed. (Cold Spring Harbor Press, Cold Spring Harbor, New York). Once a gene 
or transcript has been identified, either by matching to a data base entry, or by 
physically hybridizing to a cDNA molecule, the position of the hybridizing or 
matching region in the transcript can be determined. If the tag sequence is not 
in the 3' end, immediately adjacent to the restriction enzyme used to generate 
the SAGE tags, then a spurious match may have been made. Confirmation of 
the identity of a SAGE tag can be made by comparing transcription levels of 
the tag to that of the identified gene in certain cell types. 
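
The positional check described above can be illustrated with a short sketch. It assumes that the restriction site is CATG (the NlaIII recognition sequence) and that a tag is the site plus the following 11 bases; the function names and constants are hypothetical and do not come from the text.

```python
ANCHOR_SITE = "CATG"  # assumed restriction site (NlaIII recognition sequence)
TAG_LENGTH = 11       # assumed number of bases following the site

def three_prime_tag(transcript):
    """Return the site plus TAG_LENGTH bases at the 3'-most site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)  # last occurrence = closest to the 3' end
    if pos == -1:
        return None
    full_length = len(ANCHOR_SITE) + TAG_LENGTH
    tag = seq[pos:pos + full_length]
    return tag if len(tag) == full_length else None

def is_plausible_match(tag, transcript):
    """True when the tag sits at the 3'-most site of the transcript.

    A tag found only at an internal site is treated as a possibly
    spurious match, as the text describes.
    """
    return three_prime_tag(transcript) == tag.upper()
```

A tag that occurs elsewhere in a database entry but not at the 3'-most site would fail this check and would call for the confirmation by expression comparison mentioned in the text.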

In addition to the sequences shown in SEQ ID NOS: 1-732, or their 
complements, this invention also provides the anti-sense polynucleotide strand,
e.g. antisense RNA to these sequences or their complements. One can obtain
an antisense RNA using the sequences provided in SEQ ID NOS: 1-732 and
the methodology described in Van der Krol et al. (1988) BioTechniques 6:958.
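
As an illustration of the sequence relationship only (not of the laboratory methodology cited above), the antisense RNA corresponding to a sense DNA sequence is its reverse complement with U in place of T. The function names below are hypothetical, and the sketch assumes the input contains only the bases A, C, G, and T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

def antisense_rna(sense_dna):
    """Return the antisense RNA (5'->3') for a sense-strand DNA sequence."""
    return reverse_complement(sense_dna).replace("T", "U")
```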

The invention also encompasses polynucleotides which differ from
the polynucleotides described above, but which produce the same
phenotypic effect, such as an allele. These altered, but phenotypically
equivalent polynucleotides are referred to as "equivalent nucleic acids." This
invention also encompasses polynucleotides characterized by changes in 
non-coding regions that do not alter the phenotype of the polypeptide 
produced therefrom when compared to the polynucleotide herein. This 
invention further encompasses polynucleotides, which hybridize to the 
polynucleotides of the subject invention under conditions of moderate or high 
stringency. 

The polynucleotides can be conjugated to a detectable marker, e.g., an 
enzymatic label or a radioisotope for detection of nucleic acid and/or 
expression of the gene in a cell. A wide variety of appropriate detectable 
markers are known in the art, including fluorescent, radioactive, enzymatic or 
other ligands, such as avidin/biotin, which are capable of giving a detectable
signal. In preferred embodiments, one will likely desire to employ a fluorescent
label or an enzyme tag, such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally undesirable reagents. In the case
of enzyme tags, colorimetric indicator substrates are known which can be 
employed to provide a means visible to the human eye or 
spectrophotometrically, to identify specific hybridization with complementary 
nucleic acid-containing samples. Briefly, this invention further provides a
method for detecting a single-stranded polynucleotide identified by SEQ ID
NOS: 1-732 or its complement, by contacting target single-stranded
polynucleotides with a labeled, single-stranded polynucleotide (a probe) which 
is at least 10 nucleotides of the complement of SEQ ID NOS: 1-732 (or the 
corresponding complement) under conditions permitting hybridization 
(preferably moderately stringent hybridization conditions) of complementary 
single-stranded polynucleotides, or more preferably, under highly stringent 
hybridization conditions. Hybridized polynucleotide pairs are separated from 
un-hybridized, single-stranded polynucleotides. The hybridized polynucleotide
pairs are detected using methods well known to those of skill in the art and set
forth, for example, in Sambrook et al. (1989) supra. 

The polynucleotides of this invention can be isolated using the 
technique described in the experimental section or replicated using PCR. The
PCR technology is the subject matter of United States Patent Nos. 4,683,195,
4,800,159, 4,754,065, and 4,683,202 and described in PCR: The Polymerase 
Chain Reaction (Mullis et al. eds, Birkhauser Press, Boston (1994)) or 
MacPherson et al. (1991) and (1994), supra, and references cited therein. 
Alternatively, one of skill in the art can use the sequences provided herein and 
a commercial DNA synthesizer to replicate the DNA. Accordingly, this
invention also provides a process for obtaining the polynucleotides of this 
invention by providing the linear sequence of the polynucleotide, nucleotides, 
appropriate primer molecules, chemicals such as enzymes and instructions for 
their replication and chemically replicating or linking the nucleotides in the 
proper orientation to obtain the polynucleotides. In a separate embodiment, 
these polynucleotides are further isolated. Still further, one of skill in the art 
can insert the polynucleotide into a suitable replication vector and insert the 
vector into a suitable host cell (procaryotic or eucaryotic) for replication and 
amplification. The DNA so amplified can be isolated from the cell by methods
well known to those of skill in the art. A process for obtaining polynucleotides 
by this method is further provided herein as well as the polynucleotides so 
obtained. 

RNA can be obtained by first inserting a DNA polynucleotide into a 
suitable host cell. The DNA can be inserted by any appropriate method, e.g., 
by the use of an appropriate gene delivery vector or by electroporation. When 
the cell replicates and the DNA is transcribed into RNA, the RNA can then be
isolated using methods well known to those of skill in the art, for example, as 
set forth in Sambrook et al. (1989) supra. For instance, mRNA can be isolated 
using various lytic enzymes or chemical solutions according to the procedures 
set forth in Sambrook et al. (1989), supra or extracted by nucleic-acid-binding 
resins following the accompanying instructions provided by manufacturers.

Polynucleotides having at least 10 nucleotides and exhibiting sequence
complementarity or homology to SEQ ID NOS: 1-732 find utility as
hybridization probes. In some aspects, the full coding sequence of the
transcript, i.e., for SEQ ID NOS: 1-732, is known. Accordingly, any portion
of the known sequences available in GenBank, or homologous sequences, can
be used in the methods of this invention.

It is known in the art that a "perfectly matched" probe is not needed for 
a specific hybridization. Minor changes in probe sequence achieved by 
substitution, deletion or insertion of a small number of bases do not affect the 
hybridization specificity. In general, as much as 20% base-pair mismatch
(when optimally aligned) can be tolerated. Preferably, a probe useful for
detecting the aforementioned mRNA is at least about 80% identical to the
homologous region of comparable size contained in the previously identified
sequences identified by SEQ ID NOS: 1-732, which correspond to previously
characterized genes or SEQ ID NOS: 1-732, which correspond to known
ESTs. More preferably, the probe is 85% identical to the corresponding gene
sequence after alignment of the homologous region; even more preferably, it
exhibits 90% identity.
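
The identity thresholds above can be made concrete with a small sketch. It assumes the probe and the homologous target region have already been aligned without gaps and are of equal length; the function names and tier labels are hypothetical, and a real comparison would normally use an alignment tool rather than position-by-position matching.

```python
def percent_identity(probe, target):
    """Percent of positions that match between two pre-aligned sequences.

    Assumes an ungapped alignment of equal-length sequences.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)

def identity_tier(probe, target):
    """Map percent identity onto the 80% / 85% / 90% levels named in the text."""
    identity = percent_identity(probe, target)
    if identity >= 90:
        return "even more preferred"
    if identity >= 85:
        return "more preferred"
    if identity >= 80:
        return "preferred"
    return "below 80% identity"
```

For a 20-nucleotide probe, for example, two mismatches correspond to 90% identity and four mismatches to 80%, the upper limit of the 20% mismatch said to be tolerated.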

These probes can be used in radioassays (e.g. Southern and Northern 
blot analysis) to detect, prognose, diagnose or monitor various pancreatic or
colon cells or tissue containing these cells. The probes also can be attached to
a solid support or an array such as a chip for use in high throughput screening
assays for the detection of expression of the gene corresponding to one or
more polynucleotide(s) of this invention. Accordingly, this invention also
provides at least one of the transcripts identified as SEQ ID NOS: 1-732, or its
complement, attached to a solid support for use in high throughput screens.

The total size of the fragment, as well as the size of the complementary
stretches, will depend on the intended use or application of the particular
nucleic acid segment. Smaller fragments will generally find use in hybridization
embodiments, wherein the length of the complementary region may be varied,
such as between about 10 and about 100 nucleotides, or even full length
according to the complementary sequences one wishes to detect.

Nucleotide probes having complementary sequences over stretches
greater than 10 nucleotides in length are generally preferred, so as to increase
stability and selectivity of the hybrid, and thereby improve the specificity of
the particular hybrid molecules obtained. More preferably, one can design
polynucleotides having gene-complementary stretches of more than 50
nucleotides in length, or even longer where desired. Such fragments may be
readily prepared by, for example, directly synthesizing the fragment by
chemical means, by application of nucleic acid reproduction technology, such
as the PCR technology with two priming oligonucleotides as described in U.S.
Pat. No. 4,603,102, or by introducing selected sequences into recombinant
vectors for recombinant production. A preferred probe is about 50-75 or, more
preferably, 50-100 nucleotides in length.
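
As a non-limiting sketch of how a gene-complementary probe of a chosen length could be derived from a transcript sequence, the following Python fragment takes a window of the transcript and returns its reverse complement. The function names, the default length of 50 and the example sequence are assumptions made for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_probe(transcript: str, start: int, length: int = 50) -> str:
    """Return a probe complementary to transcript[start:start + length].

    A minimum of 10 nucleotides is enforced, following the lower bound
    given in the text; the window must lie entirely within the transcript.
    """
    if length < 10:
        raise ValueError("probe should span at least 10 nucleotides")
    if start < 0 or start + length > len(transcript):
        raise ValueError("requested window lies outside the transcript")
    return reverse_complement(transcript[start:start + length])


if __name__ == "__main__":
    example = "CATGGCAGCTCCTGT" * 5  # hypothetical 75-nt transcript
    print(design_probe(example, start=0, length=50))
```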

The polynucleotides of the present invention can serve as primers for 
the detection of genes or gene transcripts that are expressed in pancreatic or 
colon cells. In this context, amplification means any method employing a 
primer-dependent polymerase capable of replicating a target sequence with 
reasonable fidelity. Amplification may be carried out by natural or recombinant
DNA polymerases such as T7 DNA polymerase, the Klenow fragment of E. coli
DNA polymerase, and reverse transcriptase.

A preferred amplification method is PCR. However, PCR conditions 
used for each reaction are empirically determined. A number of parameters 
influence the success of a reaction. Among them are annealing temperature 
and time, extension time, Mg2+ and ATP concentration, pH, and the relative
concentration of primers, templates, and deoxyribonucleotides. After
amplification, the resulting DNA fragments can be detected by agarose gel 
electrophoresis followed by visualization with ethidium bromide staining and 
ultraviolet illumination. 
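
Because the annealing temperature is determined empirically, a rough starting estimate can be helpful. The sketch below uses the widely cited Wallace rule of thumb for short oligonucleotides (2 degrees C per A or T, 4 degrees C per G or C). This rule is not taken from the present disclosure, the function name is hypothetical, and the estimate is only a starting point for optimization.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (degrees C) of a short primer.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C). Intended only for short
    oligonucleotides and only as a first guess before empirical tuning.
    """
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if not s or at + gc != len(s):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("ACGTACGTACGTACGTAC"))
```

In practice the annealing temperature is commonly set a few degrees below the lower of the two primer estimates and then adjusted according to the observed amplification.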

The invention further provides the isolated polynucleotide operatively
linked to a promoter of RNA transcription, as well as other regulatory
sequences for replication and/or transient or stable expression of the DNA or
RNA. As used herein, the term "operatively linked" means positioned in such
a manner that the promoter will direct transcription of RNA off the DNA
molecule. Examples of such promoters are SP6, T4 and T7. In certain
embodiments, cell-specific promoters are used for cell-specific expression of
the inserted polynucleotide. Vectors which contain a promoter or a
promoter/enhancer, with termination codons and selectable marker sequences,
as well as a cloning site into which an inserted piece of DNA can be operatively
linked to that promoter are well known in the art and commercially available.
For general methodology and cloning strategies, see Gene Expression
Technology (Goeddel ed., Academic Press, Inc. (1991)) and references cited
therein and Vectors: Essential Data Series (Gacesa and Ramji, eds., John Wiley
& Sons, N.Y. (1994)), which contains maps, functional properties, commercial
suppliers and a reference to GenEMBL accession numbers for various suitable
vectors. Preferably, these vectors are capable of transcribing RNA in vitro or
in vivo.

Fragments of the sequences shown in SEQ ID NOS:1-732 or their
respective complements also are encompassed by this invention, preferably having at
least 10 nucleotides and more preferably having at least 18 nucleotides. Larger
polynucleotides, e.g., cDNA or genomic DNA, which hybridize under
moderate or stringent conditions to the polynucleotide sequences shown in
SEQ ID NOS: 1-732, or their respective complements, also are encompassed
by this invention.

In one embodiment, these fragments are polynucleotides that encode
polypeptides or proteins having diagnostic and therapeutic utilities as described
herein as well as probes to identify transcripts of the protein which may or may
not be present. These nucleic acid fragments can be prepared, for example, by
restriction enzyme digestion of the polynucleotide of SEQ ID NOS:1-732, or
their complements, and then labeled with a detectable marker. Alternatively,
random fragments can be generated using nick translation of the molecule. For
methodology for the preparation and labeling of such fragments, see Sambrook
et al. (1989) supra.

Expression vectors containing these nucleic acids are useful to obtain
host vector systems to produce proteins and polypeptides. It is implied that
these expression vectors must be replicable in the host organisms either as
episomes or as an integral part of the chromosomal DNA. Suitable expression
vectors include viral vectors, including adenoviruses, adeno-associated viruses,
retroviruses, cosmids, etc. Adenoviral vectors are particularly useful for
introducing genes into tissues in vivo because of their high levels of expression
and efficient transformation of cells both in vitro and in vivo. When a nucleic
acid is inserted into a suitable host cell, e.g., a procaryotic or a eucaryotic cell,
and the host cell replicates, the protein can be recombinantly produced.
Suitable host cells will depend on the vector and can include mammalian cells,
animal cells, human cells, simian cells, insect cells, yeast cells, and bacterial
cells constructed using well known methods. See Sambrook et al. (1989)
supra. In addition to the use of a viral vector for insertion of exogenous nucleic
acid into cells, the nucleic acid can be inserted into the host cell by methods
well known in the art such as transformation for bacterial cells; transfection
using calcium phosphate precipitation for mammalian cells; or DEAE-dextran;
electroporation; or microinjection. See Sambrook et al. (1989) supra for this
methodology. Thus, this invention also provides a host cell, e.g. a mammalian
cell, an animal cell (rat or mouse), a human cell, or a procaryotic cell such as
a bacterial cell, containing a polynucleotide encoding a protein or polypeptide
or antibody.

When the vectors are used for gene therapy in vivo or ex vivo, a
pharmaceutically acceptable vector is preferred, such as a
replication-incompetent retroviral or adenoviral vector. Pharmaceutically
acceptable vectors containing the nucleic acids of this invention can be further
modified for transient or stable expression of the inserted polynucleotide. As
used herein, the term "pharmaceutically acceptable vector" includes, but is not
limited to, a vector or delivery vehicle having the ability to selectively target
and introduce the nucleic acid into dividing cells. An example of such a vector
is a "replication-incompetent" vector defined by its inability to produce viral
proteins, precluding spread of the vector in the infected host cell. An example
of a replication-incompetent retroviral vector is LNL6 (Miller, A.D. et al.
(1989) BioTechniques 7:980-990). The methodology of using
replication-incompetent retroviruses for retroviral-mediated gene transfer of
gene markers is well established (Correll et al. (1989) PNAS USA 86:8912;
Bordignon (1989) PNAS USA 86:8912-52; Culver, K. (1991) PNAS USA
88:3155; and Rill, D.R. (1991) Blood 79(10):2694-700). Clinical investigations
have shown that there are few or no adverse effects associated with the viral
vectors, see Anderson (1992) Science 256:808-13.

Compositions containing the polynucleotides of this invention, in
isolated form or contained within a vector or host cell, are further provided
herein. When these compositions are to be used pharmaceutically, they are
combined with a pharmaceutically acceptable carrier.

This invention further encompasses genes, either genomic or cDNA,
which code for a polypeptide or protein in the cell of interest. The genes
specifically hybridize under moderate or stringent conditions to a
polynucleotide identified by SEQ ID NOS: 1-732 or their respective
complements. The process of identification of a larger fragment or the
full-length coding sequence to which the partial sequence depicted in SEQ ID
NOS:1-732 hybridizes preferably involves the use of the methods and reagents
provided in this invention, either singly or in combination.

Five methods are disclosed herein which allow one of skill in the art
to isolate the gene or cDNA corresponding to the transcripts of the invention.



RACE-PCR Technique

One method to isolate the gene or cDNA which codes for a polypeptide
or protein and which corresponds to a transcript of this invention involves the
5'-RACE-PCR technique. In this technique, the poly-A mRNA that contains
the coding sequence of particular interest is first identified by hybridization to
a sequence disclosed herein and then reverse transcribed with a 3'-primer
comprising the sequence disclosed herein. The newly synthesized cDNA strand
is then tagged with an anchor primer of a known sequence, which preferably
contains a convenient cloning restriction site attached at the 5' end. The tagged
cDNA is then amplified with the 3'-primer (or a nested primer sharing sequence
homology to the internal sequences of the coding region) and the 5'-anchor
primer. The amplification may be conducted under conditions of various levels
of stringency to optimize the amplification specificity. 5'-RACE-PCR can be
readily performed using commercial kits (available from, e.g., BRL Life
Technologies Inc., Clontech) according to the manufacturer's instructions.

Identification of known genes or ESTs 

In addition, databases exist that reduce the complexity of ESTs by
assembling contiguous EST sequences into tentative genes. For example,
TIGR has assembled human ESTs into a database called THC for tentative
human consensus sequences. The THC database allows for a more definitive
assignment compared to ESTs alone. Software programs exist (give examples)
that allow for assembling ESTs into contiguous sequences from any organism.

Isolation of cDNAs from a library by probing with the SAGE transcript or tag

Alternatively, mRNA from a sample preparation was used to construct a
cDNA library in the ZAP Express vector following the procedure described in
Velculescu et al. (1997) Science 270:484. The ZAP Express cDNA synthesis
kit (Stratagene) was used according to the manufacturer's protocol. Plates
containing 250 to 2000 plaques are hybridized as described in Rupert et al.
(1988) Mol. Cell. Bio. 8:3104 to oligonucleotide probes under the same
conditions previously described for standard probes except that the
hybridization temperature is reduced to room temperature. Washes are
performed in 6X standard-saline-citrate 0.1% SDS for 30 minutes at room
temperature. The probes are labeled with 32P-ATP through use of T4
polynucleotide kinase.

H. sapiens partial cDNA sequence; clone 76D12; ver
yj21c05.s1 Homo sapiens cDNA clone 149384 3'.
yv86c02.s1 Homo sapiens cDNA clone 249602 3' simil
EST87066 Homo sapiens cDNA 5' end similar to None.
yj33g11.s1 Homo sapiens cDNA clone 150596 3'.
zb17d08.s1 Homo sapiens cDNA clone 302319 3'.
za92h06.s1 Homo sapiens cDNA clone 300059 3'.
yv01e06.r1 Homo sapiens cDNA clone 241474 5' simil
yi63g01.r1 Homo sapiens cDNA clone 143952 5' simil
EST79335 Homo sapiens cDNA similar to None.
yo31a05.r1 Homo sapiens cDNA clone 179504 5'.
zc32c05.r1 Soares senescent fibroblasts NbHSF Homo
zc48e04.r1 Soares senescent fibroblasts NbHSF Homo
zl83fl)o\$l Stratagene colon (#937204) Homo sapiens cDNA clone 511239 3'
yj23g11.r1 Homo sapiens cDNA clone 149636 5'.
zo63d03.s1 Stratagene pancreas (#937208) Homo sapiens cDNA clone 591557 3'
EST06454 Homo sapiens cDNA clone HIBBG31 3' end.
zm21a12.s1 Stratagene pancreas (#937208) Homo sapiens cDNA clone 526270 3'
H. sapiens mRNA for cytokeratin 20.
Human profilin mRNA, complete cds.
Human smooth muscle myosin alkali light chain mRNA
Human M4-50 mRNA for HLA class I antigen.
H.sapiens mitochondrial EST sequence (001T24) from
zl74e07.s1 Stratagene colon (#937204) Homo sapiens cDNA clone 510372 3' similar to contains Alu repetitive element
HUMGS04077 Human colon 3'directed MboI cDNA, HUMGS04077, clone cm1210
H.sapiens CpG DNA, clone 140c4, reverse read cpgi4(Mitochondria EST
Human guanylin mRNA, complete cds.
Unknown
yn01b01.r1 Homo sapiens cDNA clone 167113 5' similar to SP:ZK783.1 CE00760
EST277 Homo sapiens cDNA clone 10H4.
H.sapiens mRNA for non-muscle type cofilin.
H.sapiens mitochondrial EST sequence (009T28) from
za16a03.s1 Homo sapiens cDNA clone 292684 3' similar to contains Alu repetitive element; contains element L1 repetitive element
ze30b10.s1 Soares retina N264HR Homo sapiens cDNA clone 360475 3' similar to contains Alu repetitive element
yl14h01.s1 Homo sapiens cDNA clone 158257 3' similar to contains Alu repetitive element; contains TAR1 repetitive element
zr79h Soares NhHMPu S1 Homo sapiens cDNA clone 681957 3' similar to WP:C33A12.7 CE05353
EST12940 Uterus tumor 1 Homo sapiens cDNA 3' end
za52d02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 296163 5'.
yx44c11.s1 Homo sapiens cDNA clone 264596 3'.
yz13c12.s1 Homo sapiens cDNA clone 282934 3'.
zb38c11.s1 Soares parathyroid tumor NbHPA Homo sapiens cDNA clone 305876 3'.
Human wild-type p53 activated fragment-1 (WAF1) mR
zc11f01.s1 Soares parathyroid tumor NbHPA Homo sapiens cDNA clone 322009 3'
gb|W15332|W15332 zc16d10.s1 Soares parathyroid tumor NbHPA Homo sapiens cDNA clone 322483 3'
zc04g10.s1 Soares parathyroid tumor NbHPA Homo sapiens cDNA clone 321378 3'
yw82c01.s1 Homo sapiens cDNA clone 258720 3'.
Human sodium/potassium-transporting ATPase beta-3
Unknown
zp44f11.s1 Stratagene muscle 937209 Homo sapiens cDNA clone 612333 3' similar to contains Alu repetitive element
yh87e04.s1 Homo sapiens cDNA clone 136734 3' similar to contains Alu repetitive element
yh87e04.s1 Homo sapiens cDNA clone 136734 3' similar to contains Alu repetitive element
zq06e03.s1 Stratagene muscle 937209 Homo sapiens cDNA clone 628924 3' similar to contains Alu repetitive element
hbc760 Homo sapiens cDNA clone hbc760 3'end similar to nonspecific crossreacting antigen.
zl67e01.s1 Stratagene colon (#937204) Homo sapiens cDNA clone 509688 3' similar to TR:G189087
similar to none
zo31e02.s1 Stratagene colon (#937204) Homo sapiens cDNA clone 588506 3'
zp45b09.s1 Stratagene HeLa cell s3 937216 Homo sapiens cDNA clone 612377 3'

CATGGCAGCTCCTGT
CATGTGTCCTGGTTC
CATGACAAACCCCCA
CATGTAGGATGGGGG
CATGACTGTCGCGGC
CATGGTAGCAGGTGT
CATGAATCACAAATA
CATGAGGATGGTCCC

zk10c12.s1 Soares pregnant uterus NbHPU Homo sapiens cDNA clone 470158 3'
H.sapiens granulin mRNA, complete cds.
gb|U53204|HSU53204 Human plectin (PLEC1) mRNA, complete cds.
yc22a06.s1 Homo sapiens cDNA clone 81394 3'.
gb|U67963|HSU67963 Human lysophospholipase homolog (HU-K5) mRNA
yh39a12.r1 Homo sapiens cDNA clone 132094 5' similar to gb:D26129 RIBONUCLEASE PANCREATIC PRECURSOR (HUMAN)
yj83c08.s1 Homo sapiens cDNA clone 155342 3' similar to gb:D26129 RIBONUCLEASE PANCREATIC PRECURSOR (HUMAN)
yi84h01.s1 Homo sapiens cDNA clone 145969 3' similar to gb:D26129 RIBONUCLEASE PANCREATIC PRECURSOR (HUMAN)
yj56c03.s1 Homo sapiens cDNA clone 152740 3' similar to gb:D26129 RIBONUCLEASE PANCREATIC PRECURSOR (HUMAN)
zv35h12.r1 Soares ovary tumor NbHOT Homo sapiens cDNA clone 755687 5' similar to TR:G459890 G459890 OVEREXPRESSED IN TESTICULAR TUMORS
yj40c11.r1 Homo sapiens cDNA clone 151220 5'.
zo12g08.r1 Stratagene colon (#937204) Homo sapiens cDNA clone 586718 5' similar to TR:G459890 G459890 OVEREXPRESSED IN TESTICULAR TUMORS.
zd33c10.s1 Soares fetal heart NbHH19W Homo sapiens cDNA clone 342450 3' similar to contains Alu repetitive element
yp90a02.s1 Homo sapiens cDNA clone 194666 3' similar to contains Alu repetitive element
zk69e08.s1 Soares pregnant uterus NbHPU Homo sapiens cDNA clone 488102 3' similar to contains element MER6 repetitive element
Human mRNA for metallothionein from cadmium-treated cells
yp21d05.r1 Homo sapiens cDNA clone 188073 5' similar to gb:J05021 EZRIN
emb|Y09616|HSICE H.sapiens mRNA for putative carboxylesterase
Human messenger RNA for beta-globin.
Human guanine nucleotide-binding regulatory protein
Unknown
yv72h06.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 248315 3' similar to contains element PTR7 repetitive element
Unknown
Unknown
Human carcinoembryonic antigen mRNA (CEA), complete cds.
HUMGS04154 Human colon 3'directed MboI cDNA, HUMGS04154, clone cm0215.
yc36e02.r1 Homo sapiens cDNA clone 82778 5' similar to gb:L07765 LIVER CARBOXYLESTERASE PRECURSOR
Unknown
zr19b11.s1 Stratagene NT2 neuronal precursor 937230 Homo sapiens cDNA clone 663837 3'
zq97h01.s1 Stratagene NT2 neuronal precursor 937230 Homo sapiens cDNA clone 649969 3'
yp57f10.r1 Homo sapiens cDNA clone 191563 5' similar to gb:M90657 TUMOR-ASSOCIATED ANTIGEN L6 (HUMAN)
Unknown

H743610
H 1043445

CATGTTTCTCGTCGC
CATGTCAGAGCCCTG
CATGTTCCGCGTTCC
CATGTACGGTGTGGG
CATGCTCAGAACTTG
CATGGGACTAAATGA
CATGGCTTGGGGATT
CATGACCCAACTGCC
CATGCTGAACCTCCC
CATGCAAGAGTTTCT
CATGGTCCGACTGCA
CATGTTTGGTTTCAC

Human fetal brain cDNA 3'-end OEN-007C04.
zb91h11.s1 Soares parathyroid tumor NbHPA Homo sap
H.sapiens mitochondrial EST sequence (132-20) from skeletal muscle
EST186995 HCC cell line (metastasis to liver in mouse) II Homo sapiens cDNA 5' end
H. sapiens partial cDNA sequence; clone A6A03; ver
yw53h01.s1 Homo sapiens cDNA clone 255985 3'.
Human MHC class I HLA-A2 gene, complete cds.
yf25f12.s1 Homo sapiens cDNA clone 127919 3'.
yl22c10.s1 Homo sapiens cDNA clone 158994 3'.
EST58371 Homo sapiens cDNA 3' end similar to None.
H.sapiens mitochondrial EST sequence (129-09)
zt54f10.s1 Soares ovary tumor NbHOT Homo sapiens cDNA clone 726187 3'
zt31c11.r1 Soares ovary tumor NbHOT Homo sapiens cDNA clone 723956 5' similar to TR:G205858 G205858 RAT ORF
zb62d07.s1 Soares fetal lung NbHL19W Homo sapiens cDNA clone 308173 3' similar to PIR:A39484 A39484 androgen-withdrawal apoptosis protein RVP1, prostatic - rat
zb19c06.s1 Homo sapiens cDNA clone 302506 3' similar to PIR:A39484 A39484 androgen-withdrawal apoptosis protein RVP1, prostatic - rat
zk39d06.s1 Soares pregnant uterus NbHPU Homo sapiens cDNA clone 485195 3' similar to PIR:A39484 A39484 androgen-withdrawal apoptosis protein RVP1
Human partial cDNA sequence with CCA repeat region
Human episialin variant A mRNA, 3' end.
Unknown
seq816 Homo sapiens cDNA clone b4HB3MA-COT8-HAP-Ft
Homo sapiens huntingtin (HD) gene, exon 66.
dbj|C00470|C00470 HUMGS0007620, Human Gene Signature, 3'-directed cDNA sequence.
yy62g08.s1 Homo sapiens cDNA clone 278174 3'.

CATGGCTAGGTTTAT
CATGCGCTTTAGGGA
CATGGGGGTCAGGG
CATGATTTTCTAAAA
CATGCACTTGCCCT
CATGCCTGCTGCAGG
CATGAGAACCTTCCA
CATGCTCTGCCCTC
CATGGCCATCCCCTT
CATGGCCCAGCGGCC
CATGTGGCGCGTGTC
CATGAGGGTGTTTTC
CATGCCTGGGAAGTG
CATGAGTCTGCTGGA
CATGAAAAGAGTGGT
CATGGCCACGTGGAG
CATGAGGATGTGGG

Human kallikrein mRNA, complete cds, clone clone p
ym45d10.s1 Homo sapiens cDNA clone 51262 3'.
zk01e10.s1 Soares pregnant uterus NbHPU Homo sapiens cDNA clone 469290 3'
zu12c12.r1 Soares testis NHT Homo sapiens cDNA clone 731638 5' similar to gb:M61900 Human prostaglandin D synthase gene, complete cds. (HUMAN)
gb|U66894|HSU66894 Human epithelium-restricted Ets protein ESX mRNA,
Human epithelial-specific transcription factor ESE-1b (ESE-1) mRNA, complete cds
Human colon 3'directed MboI cDNA, HUMGS06772
Unknown
ze88g07.s1 Soares fetal heart NbHH19W Homo sapiens cDNA clone 366108 3'
za90h10.s1 Soares fetal lung NbHL19W Homo sapiens cDNA clone 299875 3'.
zn52h06.s1 Stratagene muscle 937209 Homo sapiens cDNA clone 561851 3'
Human HepG2 3'-directed MboI cDNA, clone a-35.
IB2474 Homo sapiens cDNA 3'end.
yc82e01.r1 Homo sapiens cDNA clone 22306 5'.
za61h02.s1 Homo sapiens cDNA clone 297075 3'.
zh75f08.s1 Soares fetal liver spleen 1NFLS S1 Homo sapiens cDNA clone 417927 3'
H. sapiens partial cDNA sequence; clone c-29h08.
Human 11 beta-hydroxysteroid dehydrogenase type II
ya31a06.s5 Homo sapiens cDNA clone 62194 3' contains Alu repetitive element
Unknown
Unknown
Unknown
Unknown

Isolation of partial cDNA (3' fragment) by 3' directed PCR reaction

This procedure is a modification of the protocol described in Polyak et 
al. (1997) Nature 389:300. Briefly, the procedure uses SAGE tags in a PCR
reaction such that the resultant PCR product contains the SAGE tag of interest
as well as additional cDNA, the length of which is defined by the position of
the tag with respect to the 3' end of the cDNA. The cDNA product derived
from such a transcript-driven PCR reaction can be used for many applications.
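
For illustration, the relationship between tag position and product length can be sketched as follows. The function name and the cDNA sequence are hypothetical; the sketch assumes the sense-strand cDNA sequence is known and takes the 3'-most occurrence of the tag, which is an assumption based on SAGE tags being derived from the 3'-most anchoring enzyme site. Primer and adaptor sequences that would flank a real product are ignored.

```python
def tag_to_three_prime(cdna: str, tag: str) -> str:
    """Return the part of the cDNA from the 3'-most occurrence of the tag
    through the 3' end, i.e. the tag plus the additional downstream cDNA."""
    pos = cdna.upper().rfind(tag.upper())
    if pos == -1:
        raise ValueError("tag not found in cDNA")
    return cdna[pos:]


if __name__ == "__main__":
    tag = "CATGGCAGCTCCTGT"  # 15-base tag (CATG plus 11 bases)
    cdna = "GGGAAA" + tag + "TTTACGAAAAAAAA"  # hypothetical cDNA
    product = tag_to_three_prime(cdna, tag)
    # Additional cDNA beyond the tag is set by the distance to the 3' end.
    print(product, len(product) - len(tag))
```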

RNA from a source believed to express the cDNA corresponding to a
given tag is first converted to double-stranded cDNA using any standard
cDNA protocol. Conditions similar to those used to generate cDNA for SAGE library
construction can be employed except that a modified oligo-dT primer is used
to drive the first strand synthesis. For example, the oligonucleotide of
composition 5'-B-TCC GGC GCG CCG TTT TCC CAG TCA CGA(30)-3'
contains a poly-T stretch at the 3' end for hybridization and priming from
poly-A tails, an M13 priming site for use in subsequent PCR steps, a 5' biotin
label (B) for capture on streptavidin-coated magnetic beads, and an AscI
restriction endonuclease site for releasing the cDNA from the
streptavidin-coated magnetic beads. Theoretically, any sufficiently sized DNA
region capable of hybridizing to a PCR primer can be used, as can any other
endonuclease recognizing an 8 base pair site.
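
The layout of the example primer can be checked with a short sketch. It assumes that "(30)" in the composition above denotes a run of 30 T residues, which the text implies but does not state outright, and it omits the 5' biotin label, which is not a base. The function names are hypothetical.

```python
ASCI_SITE = "GGCGCGCC"  # AscI recognition sequence (8 base pairs)


def build_primer(poly_t_length: int = 30) -> str:
    """Assemble the example modified oligo-dT primer (biotin label omitted).

    Assumption: the "(30)" in the written composition is a 30-residue
    poly-T stretch at the 3' end.
    """
    core = "TCC GGC GCG CCG TTT TCC CAG TCA CGA".replace(" ", "")
    return core + "T" * poly_t_length


def describe_primer(primer: str) -> dict:
    """Locate the AscI site, the internal priming site and the 3' poly-T run."""
    asci_start = primer.find(ASCI_SITE)
    if asci_start == -1:
        raise ValueError("no AscI site found")
    poly_t_length = len(primer) - len(primer.rstrip("T"))
    priming_site = primer[asci_start + len(ASCI_SITE):len(primer) - poly_t_length]
    return {
        "asci_start": asci_start,
        "priming_site": priming_site,
        "poly_t_length": poly_t_length,
    }


if __name__ == "__main__":
    print(describe_primer(build_primer()))
```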

cDNA constructed utilizing this or a similar modified oligo-dT primer is
then processed exactly as described in U.S. Patent No. (insert) up until adapter
ligation, where only one adapter is ligated to the cDNA pool. After adapter
ligation, the cDNA is released from the streptavidin-coated magnetic beads and
is then used as a template for cDNA amplification.

Various PCR protocols can be employed using PCR priming sites
within the 3' modified oligo-dT primer and the SAGE tag. The SAGE
tag-derived PCR primer employed can be of varying length dictated by 5'
extension of the tag into the adaptor sequence. cDNA products are now
available for a variety of applications.

This technique can be further modified by: (1) altering the length and/or
content of the modified oligo-dT primer; (2) ligating adaptors other than that
previously employed within the SAGE protocol; (3) performing PCR from
template retained on the streptavidin-coated magnetic beads; and (4) priming
first strand cDNA synthesis with non-oligo-dT based primers.

Isolation of cDNA using GeneTrapper or modified GeneTrapper Technology

The reagents and manufacturer's instructions for this technology are
commercially available from Life Technologies, Inc., Gaithersburg, Maryland.
Briefly, a complex population of single-stranded phagemid DNA containing
directional cDNA inserts is enriched for the target sequence by hybridization
in solution to a biotinylated oligonucleotide probe complementary to the target
sequence. The hybrids are captured on streptavidin-coated paramagnetic
beads. A magnet retrieves the paramagnetic beads from the solution, leaving
nonhybridized single-stranded DNAs behind. Subsequently, the captured
single-stranded DNA target is released from the biotinylated oligonucleotide.
After release, the cDNA clone is further enriched by using a nonbiotinylated
target oligonucleotide to specifically prime conversion of the single-stranded
target to double-stranded DNA. Following transformation and plating,
typically 20% to 100% of the colonies represent the cDNA clone of interest.
To identify the desired cDNA clone, the colonies may be screened by colony
hybridization using the 32P-labeled oligonucleotide as described above for
solution hybridization, or alternatively by DNA sequencing and alignment of
all sequences obtained from numerous clones to determine a consensus
sequence.

The genes which are identified herein as being differentially expressed
in normal and cancer cells can be used diagnostically and prognostically.
Transcription levels in a test sample suspected of being neoplastic can be
determined and compared to the levels in normal colon cells. The test sample
may be from any tissue suspected of neoplasia, and particularly from either
suspected colorectal or suspected pancreatic cancer cells. The control cells for
the purposes of comparison are normal cells, preferably of the same tissue type
as the test sample, e.g., colon cells, or pancreatic duct epithelial cells.
Upregulation of transcription or downregulation of transcription is therefore
diagnostic of the neoplastic state, depending on what gene is used as a test
reagent. Similarly, transcription levels can be monitored to assess patient
responses to anti-tumor therapies. Transcription levels will also provide
prognostic information. For example, the level of transcription in a test sample
can be compared to levels found in bona fide normal and tumor cells. More
extreme deviations from normal expression levels indicate a poorer prognosis.
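
The comparison of a test sample with a normal control can be sketched as a simple ratio test. The two-fold threshold, the pseudocount of 1 used to avoid division by zero, and the function name are assumptions chosen for illustration; the disclosure does not prescribe particular values.

```python
def classify_transcript(test_level: float, normal_level: float,
                        fold_threshold: float = 2.0) -> str:
    """Label a transcript as upregulated, downregulated or unchanged.

    Levels are non-negative measurements (for example tag counts). A
    pseudocount of 1 is added to both so that a zero count in either
    sample does not cause division by zero.
    """
    if test_level < 0 or normal_level < 0:
        raise ValueError("levels must be non-negative")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    ratio = (test_level + 1) / (normal_level + 1)
    if ratio >= fold_threshold:
        return "upregulated"
    if ratio <= 1 / fold_threshold:
        return "downregulated"
    return "unchanged"


if __name__ == "__main__":
    print(classify_transcript(50, 5))
    print(classify_transcript(2, 40))
    print(classify_transcript(10, 12))
```

Whether upregulation or downregulation is indicative of neoplasia depends on the particular gene, as noted above, so the label returned here would be interpreted against the direction expected for that gene.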

Transcription levels can be determined according to any means known 
in the art. These include, without limitation, Northern blots, nuclear run-on 
assays, in vitro transcription assays, primer extension assays, quantitative 
reverse transcriptase-polymerase chain reactions (RT-PCR), and hybrid filter 
binding assays. These techniques are well known in the art. See J.C. Alwine,
D.J. Kemp, G.R. Stark, Proc. Natl. Acad. Sci. U.S.A. 74, 5350 (1977); K.
Zinn, D. DiMaio, T. Maniatis, Cell 34, 865 (1983); G. Veres, R.A. Gibbs,
S.E. Scherer, C.T. Caskey, Science 237, 415 (1987).

Similarly, upregulated genes and downregulated genes can be detected
by measuring expression of their protein products. This can be done by any
means known in the art, including but not limited to Western (immuno) blot,
enzyme linked immunosorbent assay, radioimmunoassay, and enzyme assay.
Such techniques are well known in the art. Protein products can be detected
in tissue samples of a test patient, using a suspect sample as a test sample, and
a matched normal tissue sample from the same tissue type as a control. If
normal tissue is not available then a closely related tissue type can be used.
Desirably both the samples being compared will be from the same individual.
Alternatively, aberrant expression levels of protein products can be detected in
body samples, such as blood, serum, feces, urine, and sputum. As a control, a
normal matched sample can be used from a healthy individual. Aberrant
expression levels of transcripts can also be detected in such body samples,
particularly in blood and serum.

Probes for use in the assays for transcription levels of particular genes
or sets of genes may be RNA or DNA. The probes will be isolated
substantially free of other cellular RNAs or DNAs. If the reagent contains one
probe then it will comprise at least 50% of the nucleic acids in the reagent
composition. If the reagent contains more than one probe, then the proportion
of each probe will decrease accordingly, so that the specific probes together
will still comprise at least 50% of the nucleic acids in the reagent composition.

Probes can be labeled according to any means known in the art. These 
may include radioactive labels, fluorescent labels, enzymatic labels, and binding 
partner labels such as biotin. Means for labeling and detecting probes are well 
known in the art. Probes comprise at least 10, 11, 12, 15, 20, or 30 contiguous
nucleotides of a selected gene. 

This invention provides proteins or polypeptides expressed from the
polynucleotides of this invention, which are intended to include wild-type and
recombinantly produced polypeptides and proteins from procaryotic and
eucaryotic host cells, as well as muteins, analogs and fragments thereof. In
some embodiments, the term also includes antibodies and anti-idiotypic 
antibodies. 

It is understood that functional equivalents or variants of the wild-type 
polypeptide or protein also are within the scope of this invention, for example, 
those having conservative amino acid substitutions. Other analogs include 
fusion proteins comprising a protein or polypeptide. 

The proteins and polypeptides of this invention are obtainable by a
number of processes well known to those of skill in the art, which include
purification, chemical synthesis and recombinant methods. Full length proteins
can be purified from a colon or pancreatic cell or tissue lysate by methods such
as immunoprecipitation with antibody, and standard techniques such as gel
filtration, ion-exchange, reversed-phase, and affinity chromatography using a
fusion protein as shown herein. For such methodology, see for example
Deutscher et al. (1999) Guide To Protein Purification: Methods In
Enzymology (Vol. 182, Academic Press). Accordingly, this invention also
provides the processes for obtaining these proteins and polypeptides as well as
the products obtainable and obtained by these processes.

The proteins and polypeptides also can be obtained by chemical 
synthesis using a commercially available automated peptide synthesizer such 
as those manufactured by Perkin Elmer/Applied Biosystems, Inc., Model 430A 
or 431A, Foster City. The synthesized protein or polypeptide can be 
precipitated and further purified, for example by high performance liquid 
chromatography (HPLC). Accordingly, this invention also provides a process 
for chemically synthesizing the proteins of this invention by providing the 
sequence of the protein and reagents, such as amino acids and enzymes and 
linking together the amino acids in the proper orientation and linear sequence. 

Alternatively, the proteins and polypeptides can be obtained by 
well-known recombinant methods as described, for example, in Sambrook et 
al, (1989), supra, using the host cell and vector systems described above. 

Also provided by this application are the polypeptides and proteins 
described herein conjugated to a detectable agent for use in the diagnostic 
methods. For example, detectably labeled proteins and polypeptides can be 
bound to a column and used for the detection and purification of antibodies. 
They also are useful as immunogens for the production of antibodies as 
described below. The proteins and fragments of this invention are useful in an 
in vitro assay system to screen for agents or drugs, which modulate cellular 
processes. 

The proteins of this invention also can be combined with various liquid 
phase carriers, such as sterile or aqueous solutions, pharmaceutically 
acceptable carriers, suspensions and emulsions. Examples of non-aqueous 
solvents include propyl ethylene glycol, polyethylene glycol and vegetable oils. 
When used to prepare antibodies, the carriers also can include an adjuvant that 
is useful to non-specifically augment a specific immune response. A skilled 
artisan can easily determine whether an adjuvant is required and select one. 
However, for the purpose of illustration only, suitable adjuvants include, but 
are not limited to Freund's Complete and Incomplete, mineral salts and 
polynucleotides. 

This invention also provides a pharmaceutical composition comprising 
any of a protein, analog, mutein, polypeptide fragment, antibody, antibody 
fragment or anti-idiotypic antibody of this invention, alone or in combination 
with each other or other agents, and an acceptable carrier. These compositions 
are useful for various diagnostic and therapeutic methods. 

Antibodies can be generated using the proteins encoded by the 
transcripts identified by the tags disclosed herein. Use of all or portions of the 
protein as immunogens is routine in the art. Similarly, fusion proteins can be 
used as immunogens. Antibodies can be affinity purified using the proteins or 
portions thereof used as immunogens. Similarly, monoclonal antibodies 
specifically immunoreactive with the protein sequences of the invention can be 
generated according to techniques which are well known in the art. 

Antibodies can be used analytically to quantitate the expression of 
particular transcripts identified herein as upregulated or downregulated in 
cancer. In addition, antibodies can be conjugated or non-covalently linked to 
cytotoxic agents, such as cytotoxins, radionuclides, chemotherapeutic drugs, 
etc. Such antibodies can be used therapeutically to specifically target cancer 
cells in which the protein antigens are upregulated. These include the proteins 
encoded by the transcripts identified by the tags shown in Tables 2, 4, and 5. 
Means of making such linked cytotoxic antibodies and of administering the 
same are well known in the art. 

Also provided by this invention is an antibody capable of specifically 
forming a complex with the proteins or polypeptides as described above. The 
term "antibody" includes polyclonal antibodies and monoclonal antibodies. 
The antibodies include, but are not limited to mouse, rat, and rabbit or human 
antibodies. 

Laboratory methods for producing polyclonal antibodies and 
monoclonal antibodies, as well as deducing their corresponding nucleic acid 
sequences, are known in the art, see Harlow and Lane (1988) supra and 
Sambrook et al. (1989) supra. The monoclonal antibodies of this invention can 
be biologically produced by introducing protein or a fragment thereof into an 
animal, e.g., a mouse or a rabbit. The antibody producing cells in the animal 
are isolated and fused with myeloma cells or heteromyeloma cells to produce 
hybrid cells or hybridomas. Accordingly, the hybridoma cells producing the 
monoclonal antibodies of this invention also are provided. 

Thus, using the protein or fragment thereof and well known methods, 
one of skill in the art can produce and screen the hybridoma cells and 
antibodies of this invention for antibodies having the ability to bind the proteins 
or polypeptides. 

If a monoclonal antibody being tested binds with the protein or 
polypeptide, then the antibody being tested and the antibodies provided by the 
hybridomas of this invention are equivalent. It also is possible to determine 
without undue experimentation, whether an antibody has the same specificity 
as the monoclonal antibody of this invention by determining whether the 
antibody being tested prevents a monoclonal antibody of this invention from 
binding the protein or polypeptide with which the monoclonal antibody is 
normally reactive. If the antibody being tested competes with the monoclonal 
antibody of the invention as shown by a decrease in binding by the monoclonal 
antibody of this invention, then it is likely that the two antibodies bind to the 
same or a closely related epitope. Alternatively, one can pre-incubate the 
monoclonal antibody of this invention with a protein with which it is normally 
reactive, and determine if the monoclonal antibody being tested is inhibited in 
its ability to bind the antigen. If the monoclonal antibody being tested is 
inhibited then, in all likelihood, it has the same, or a closely related, epitopic 
specificity as the monoclonal antibody of this invention. 

The term "antibody" also is intended to include antibodies of all 
isotypes. Particular isotypes of a monoclonal antibody can be prepared either 
directly by selecting from the initial fusion, or prepared secondarily, from a 
parental hybridoma secreting a monoclonal antibody of different isotype by 
using the sib selection technique to isolate class switch variants using the 
procedure described in Steplewski et al. (1985) Proc. Natl. Acad. Sci. 82:8653 
or Spira et al. (1984) J. Immunol. Methods 74:307. 

This invention also provides biologically active fragments of the 
polyclonal and monoclonal antibodies described above. These "antibody 
fragments" retain some ability to selectively bind with their antigen or 
immunogen. Such antibody fragments can include, but are not limited to: 

(1) Fab, 

(2) Fab', 

(3) F(ab')2, 

(4) Fv, and 

(5) SCA. 

A specific example of "a biologically active antibody fragment" is a 
CDR region of the antibody. Methods of making these fragments are known 
in the art, see for example, Harlow and Lane, (1988) supra. 

The antibodies of this invention also can be modified to create chimeric 
antibodies and humanized antibodies (Oi, et al. (1986) BioTechniques 
4(3):214). Chimeric antibodies are those in which the various domains of the 
antibodies' heavy and light chains are coded for by DNA from more than one 
species. 

The isolation of other hybridomas secreting monoclonal antibodies with 
the specificity of the monoclonal antibodies of the invention can also be 
accomplished by one of ordinary skill in the art by producing anti-idiotypic 
antibodies (Herlyn, et al. (1986) Science 232:100). An anti-idiotypic antibody 
is an antibody which recognizes unique determinants present on the 
monoclonal antibody produced by the hybridoma of interest. 

Idiotypic identity between monoclonal antibodies of two hybridomas 
demonstrates that the two monoclonal antibodies are the same with respect to 
their recognition of the same epitopic determinant. Thus, by using antibodies 
to the epitopic determinants on a monoclonal antibody it is possible to identify 
other hybridomas expressing monoclonal antibodies of the same epitopic 
specificity. 

It is also possible to use the anti-idiotype technology to produce 
monoclonal antibodies which mimic an epitope. For example, an anti-idiotypic 
monoclonal antibody made to a first monoclonal antibody will have a binding 
domain in the hypervariable region which is the mirror image of the epitope 
bound by the first monoclonal antibody. Thus, in this instance, the 
anti-idiotypic monoclonal antibody could be used for immunization for 
production of these antibodies. 

As used in this invention, the term "epitope" is meant to include any 
determinant having specific affinity for the monoclonal antibodies of the 
invention. Epitopic determinants usually consist of chemically active surface 
groupings of molecules such as amino acids or sugar side chains and usually 
have specific three dimensional structural characteristics, as well as specific 
charge characteristics. 

The antibodies of this invention can be linked to a detectable agent or 
label. There are many different labels and methods of labeling known to those 
of ordinary skill in the art. 

The antibody-label complex is useful to detect the protein or fragments 
in a sample, using standard immunochemical techniques such as 
immunohistochemistry as described by Harlow and Lane (1988) supra. 
Competitive and non-competitive immunoassays in either a direct or indirect 
format are examples of such assays, e.g., enzyme-linked immunoassay (ELISA), 
radioimmunoassay (RIA) and the sandwich (immunometric) assay. Those of 
skill in the art will know, or can readily discern, other immunoassay formats 
without undue experimentation. 

The coupling of antibodies to low molecular weight haptens can 
increase the sensitivity of the assay. The haptens can then be specifically 
detected by means of a second reaction. For example, it is common to use 
haptens such as biotin, which reacts with avidin, or dinitrophenyl, pyridoxal, and 
fluorescein, which can react with specific anti-hapten antibodies. See Harlow 
and Lane (1988) supra. 

The monoclonal antibodies of the invention also can be bound to many 
different carriers. Thus, this invention also provides compositions containing 
the antibodies and another substance, active or inert. Examples of well-known 
carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, 
amylases, natural and modified celluloses, polyacrylamides, agaroses and 
magnetite. The nature of the carrier can be either soluble or insoluble for 
purposes of the invention. Those skilled in the art will know of other suitable 
carriers for binding monoclonal antibodies, or will be able to ascertain such, 
using routine experimentation. 

Compositions containing the antibodies, fragments thereof or cell lines 
which produce the antibodies, are encompassed by this invention. When these 
compositions are to be used pharmaceutically, they are combined with a 
pharmaceutically acceptable carrier. 

The present invention also provides a screen for various agents which 
modulate the expression of a gene in a pancreatic or colon cell. To practice the 
method in vitro, suitable cell cultures or tissue cultures are first provided. The 
cell can be a cultured cell or a genetically modified cell in which a transcript 
from SEQ ID NOS: 1-732, or their complements, is expressed. Alternatively, 
the cells can be from a tissue biopsy. The cells are cultured under conditions 
(temperature, growth or culture medium and gas (CO2)) and for an appropriate 
amount of time to attain exponential proliferation without density dependent 
constraints. It also is desirable to maintain an additional separate cell culture, 
one which does not receive the agent being tested, as a control. 

As is apparent to one of skill in the art, suitable cells may be cultured 
in microtiter plates and several agents may be assayed at the same time by 
noting genotypic changes, phenotypic changes or cell death. 

When the agent is a composition other than a DNA or RNA, the agent 
may be directly added to the cell culture or added to culture medium for 
addition. As is apparent to those skilled in the art, an "effective" amount must 
be added which can be empirically determined. When the agent is a 
polynucleotide, it may be directly added by use of a gene gun or 
electroporation. Alternatively, it may be inserted into the cell using a gene 
delivery vehicle or vector as described above. 

An agent is a potential therapeutic if it alters the expression of a gene in 
the cell. Altered expression can be detected by assaying for altered mRNA 
expression or protein expression using the probes, primers and antibodies as 
described herein. 

For the purposes of this invention, an "agent" is intended to include, but 
not be limited to a biological or chemical compound such as a simple or 
complex organic or inorganic molecule, a peptide, a protein (e.g. antibody) or 
a polynucleotide (e.g. anti-sense). A vast array of compounds can be 
synthesized, for example polymers, such as polypeptides and polynucleotides, 
and synthetic organic compounds based on various core structures, and these 
are also included in the term "agent". In addition, various natural sources can 
provide compounds for screening, such as plant or animal extracts, and the 
like. It should be understood, although not always explicitly stated, that the 
agent is used alone or in combination with another agent, having the same or 
different biological activity as the agents identified by the inventive screen. The 
agents and methods also are intended to be combined with other therapies. 

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are 
not intended to limit the scope of the invention. 

EXAMPLE 1 

This example demonstrates the characterization of the general 
transcription of human colorectal epithelium, colorectal cancers, and pancreatic 
cancers. 

We used the recently developed SAGE (serial analysis of gene 
expression) method to identify and quantify a total of 303,706 transcripts 
derived from human colorectal (CR) epithelium, CR cancers or pancreatic 
cancers (Table 1A) (3). These transcripts represented approximately 48,741 
different genes (4) that ranged in average expression from 1 copy per cell to as 
many as 5,300 copies per cell (5). The number of different transcripts observed 
in each cell population varied from 14,247 to 20,471. The bulk of the mRNA 
mass (75%) consisted of transcripts expressed at more than five copies per cell 
on average (Table 1B). In contrast, the majority (86%) of transcripts were 
expressed at less than 5 copies per cell, but in aggregate this low abundance 
class represented only 25% of the mRNA mass. This distribution was 
consistently observed among the different samples analyzed and was consistent 
with previous studies of RNA abundance classes based on RNA-DNA 
reassociation kinetics (Rot curves). Monte Carlo simulations revealed that our 
analyses had a 92% probability of detecting a transcript expressed at an 
average of three copies per cell (7). 

Many of the SAGE tags appeared to represent previously undescribed 
transcripts, as only 54% of the tags matched entries in GenBank (Table 1). 
Twenty percent of these matching transcripts corresponded to characterized 
mRNA sequence entries in GenBank, whereas 80% matched uncharacterized 
EST entries. As expected, the likelihood of a tag being present in the 
databases was related to abundance; GenBank matches were identified for 98% 
of the transcripts expressed at more than 500 copies per cell but for only 51% 
of the transcripts expressed at < 5 copies per cell. Because the SAGE data 
provide a quantitative assay of transcript abundance, unaffected by differences 
in cloning or PCR efficiency, these data provide an independent and relatively 
unbiased estimate of the current completeness of publicly available EST 
databases. 

EXAMPLE 2 

This example demonstrates a comparison of the expression pattern of 
normal colon epithelium and primary colon cancers. 

Comparison of expression patterns between normal colon epithelium 
and primary colon cancers revealed that the majority of transcripts were 
expressed at similar levels (Fig. 1A). However, the expression profiles also 
revealed 289 transcripts that were expressed at significantly different levels [P 
< 0.01, (8)]. Of these 289, 181 were decreased in colon tumors compared to 
normal colon (average decrease 10-fold; Fig. 1B; examples in Fig. 2A). 
Conversely, 108 transcripts were expressed at higher levels in the colon 
cancers than in normal colon (average increase 13-fold; Fig. 1C; examples in 
Fig. 2A). Monte Carlo simulations indicated that the analysis would have 
detected over 95% of those transcripts expressed at a 6-fold or greater level 
in normal vs. tumor cells or vice versa (9). Because relatively stringent criteria 
were used for defining differences [P < 0.01, (8)], the number of differences 
reported above is likely to be an underestimate. 

EXAMPLE 3 

This example demonstrates the similarities and differences between 
cancer cell line transcription and transcription of primary cancer tissues. 

To determine how many of the 289 differences were independent of the cellular 
microenvironment of cancers in vivo, SAGE data from CR cancer cell lines was 
compared to that from primary CR cancer tissues (Fig. 1B, 1C). Perhaps 
surprisingly, the majority of transcripts (130 of 181) that were expressed at 
reduced levels in cancer cells in vivo were also expressed at significantly lower 
levels in the cell lines (Fig. 1B). Likewise, a significant fraction of the 
transcripts expressed at increased levels in primary cancers were also expressed 
at higher levels in the CR cancer cell lines (Fig. 1C). Thus, many of the gene 
expression differences that distinguish normal from tumor cells in vivo persist 
during in vitro growth. However, despite these similarities there were also 
many differences. For example, only 47 of 228 genes expressed at higher 
levels in CR cancer cell lines were also expressed at high levels in the primary 
CR cancers. 

In combination, comparing the expression pattern of CR cancer cells 
(in vivo or in vitro) to normal colon revealed 548 differentially expressed 
transcripts (Fig. 1B,C, Tables 2 and 3). The average difference in expression 
for these transcripts was 15 fold. Although the ability to detect differences is 
influenced by the magnitude of the variance with the power to detect smaller 
differences being less, 92 transcripts that were less than three fold different 
were identified among the 548 transcripts. However, those genes exhibiting 
the greatest differences in expression are likely to be the most biologically 
important. 

EXAMPLE 4 

This example demonstrates the similarities and differences between 
colorectal cancer transcription and pancreatic cancer transcription. 

To determine whether the changes noted in CR cancers were neoplasia or cell 
type specific, we performed SAGE on mRNA derived from pancreatic cancers. 
A total of 404 transcripts were expressed at higher levels in pancreatic cancers 
compared to normal colon epithelium (examples in Fig. 2B). The majority 
(268) of these transcripts were pancreas-specific (10) (example in Fig. 2C) 
although 136 were also expressed at high levels in CR cancers. These 136 
transcripts constituted 47% of the 289 transcripts increased in CR cancers 
relative to normal colon and are likely to be related to the neoplastic process 
rather than to the specific cell type of origin. 
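
The counts quoted in Examples 2 through 4 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and forms no part of the disclosed methods. It assumes, as one reading that the text does not state explicitly, that the 289 transcripts increased in CR cancers are the union of the 108 increased in primary cancers and the 228 increased in CR cancer cell lines, 47 being common to both. The function and variable names are hypothetical.

```python
def union_count(a: int, b: int, shared: int) -> int:
    """Size of the union of two sets, given their sizes and their overlap."""
    if shared > min(a, b):
        raise ValueError("overlap cannot exceed the size of either set")
    return a + b - shared


# Counts quoted in Examples 2 to 4.
UP_IN_PRIMARY_CR = 108      # increased in primary colon cancers
UP_IN_CR_CELL_LINES = 228   # increased in CR cancer cell lines
UP_IN_BOTH = 47             # increased in both of the above
UP_IN_PANCREATIC = 404      # increased in pancreatic cancers
PANCREAS_SPECIFIC = 268     # of those, not increased in CR cancers

# Assumed reading: transcripts increased in CR cancers in vivo or in vitro.
up_in_cr = union_count(UP_IN_PRIMARY_CR, UP_IN_CR_CELL_LINES, UP_IN_BOTH)

# Transcripts increased in both pancreatic and CR cancers.
shared_with_cr = UP_IN_PANCREATIC - PANCREAS_SPECIFIC

# Fraction of the CR-increased transcripts also increased in pancreatic cancers.
shared_fraction = shared_with_cr / up_in_cr

if __name__ == "__main__":
    print(up_in_cr, shared_with_cr, f"{shared_fraction:.0%}")
```

Under that assumed reading the figures agree with the text: 289 transcripts increased in CR cancers, 136 shared with pancreatic cancers, and a shared fraction of 47%.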

EXAMPLE 5 

This example demonstrates the reproducibility of the transcription 
patterns observed among a larger number of cancer samples. 

One question that arose from these data is the potential heterogeneity 
of expression between individual tumors. The SAGE data were acquired from 
two examples of each tissue type (normal colon, primary CR cancer, CR cancer 
cell line, etc.). To examine the generality of these expression profiles, we 
arbitrarily selected 27 differentially expressed transcripts and evaluated them 
in six to twelve samples of normal colon and primary cancers by Northern blot 
analysis (11). In general, expression patterns were very reproducible among 
different samples. Of 10 genes with elevated expression in normal colon 
relative to CR cancers as determined by SAGE, each was detected in the 
normal colon samples and was expressed at considerably lower levels in tumors 
(examples in Fig. 2A). Similarly, most of the genes identified by SAGE as 
increased in CR or pancreatic cancers were confirmed to be reproducibly 
expressed in the majority of primary cancers examined by Northern blot 
(examples in Fig. 2A). It is important to note, however, that there were 
differences among the cancers, with a few cancers exhibiting particularly high 
or low levels of individual transcripts. Such differences in gene expression 
undoubtedly contribute to the observed heterogeneity in biological properties 
of cancers derived from the same organ. 

EXAMPLE 6 

This example demonstrates the identities of some of the transcripts 
which were found to be differentially expressed in tumor and normal tissues. 

What are the identities of the differentially expressed genes? Of the 548 
differentially expressed transcripts, 317 were tentatively identified through 
database comparisons. When tested, the great majority (93%) of these 
identifications proved to be legitimate as expected from previous SAGE 
analyses. Although a large number of differentially expressed genes were 
identified, some simple patterns did emerge. For example, genes that were 
expressed at higher levels in normal colon epithelium than in CR tumors were 
often differentiation-related. These genes included liver fatty acid binding 
protein, cytokeratin 20, carbonic anhydrase, guanylin and uroguanylin, 
which are known to be important for the normal physiology or architecture of 
the colon epithelium (Table 2). On the other hand, genes that were increased 
in CR cancers were often related to the robust growth characteristics that these 
cells exhibit. For example, gene products associated with protein synthesis, 
including 48 ribosomal proteins, five elongation factors, and five genes 
involved in glycolysis were observed to be elevated in both CR and pancreatic 
cancers compared to normal colon cells. Although the majority of the 
transcripts could not have been predicted to be differentially expressed in 
cancers, several have previously been shown to be dysregulated in neoplastic 
cells. The latter included IGFII, B23 nucleophosmin, the Pi form of 
glutathione S-transferase, and several ribosomal proteins which were all 
increased in cancer cells as previously reported. Likewise, Dra and gelsolin 
were both decreased in cancer as previously reported. Surprisingly, two widely 
studied oncogenes, c-fos and c-erbB3, were expressed at much higher levels in 
normal colon epithelium than CR cancers, in contrast to their up-regulation in 
transformed cells. 

In summary, these data provide basic information necessary for 
understanding the gene expression differences that underlie cancer phenotypes. 
They additionally provide a necessary framework for interpreting the 
significance of individual differentially expressed genes. Although this study 
demonstrated that a large number of such differences exist (approximately 500 
at the depth of analysis employed), it was equally remarkable that the fraction 
of transcripts exhibiting significant differences was relatively small, 
representing 1.5% of the transcripts detected in any given cell type (26). The 
fact that many, but not all, of the differences were preserved during in vitro 
culture demonstrates the utility of cultured lines for examination of some 
aspects of gene expression, but also provides a note of caution in relying on 
such lines to perfectly mimic tumors in their natural environment. Finally, the 
finding that hundreds of specific genes are expressed at different levels in CR 
cancers, and that some of these are also expressed differentially in pancreatic 
cancers, provides a wealth of new reagents for future biologic and diagnostic 
experimentation. 

performed as previously described (2). SAGE data was analyzed by means of 
SAGE software and GenBank Release 95 as previously described (2). 

4. A total of 69,393 different SAGE tags were identified among 
the 303,706 tags analyzed. A small fraction of these different tags were likely 
due to sequencing errors. SAGE analysis of yeast (2), wherein the entire 
genomic sequence is known, demonstrated a sequencing error rate of ~0.7%, 

5. Abundances can be simply determined by dividing the observed 
number of tags for a given transcript by the total number of tags obtained. An 
estimate of approximately 300,000 transcripts per cell was used to convert the 
abundances to copies per cell [N. D. Hastie, J. O. Bishop, Cell 9, 761 (1976)]. 
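
The conversion described in note 5 can be written out as a short calculation. The following is a minimal sketch rather than the SAGE software referred to in the text; it assumes only the two quantities given above (tag count divided by total tags, scaled by roughly 300,000 transcripts per cell). The function names and the example tag counts are hypothetical.

```python
# Estimate from note 5: roughly 300,000 mRNA transcripts per cell.
TRANSCRIPTS_PER_CELL = 300_000


def abundance(tag_count: int, total_tags: int) -> float:
    """Fraction of all observed tags that belong to one transcript."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    if not 0 <= tag_count <= total_tags:
        raise ValueError("tag_count must lie between 0 and total_tags")
    return tag_count / total_tags


def copies_per_cell(tag_count: int, total_tags: int,
                    transcripts_per_cell: int = TRANSCRIPTS_PER_CELL) -> float:
    """Convert a tag count into an estimated number of copies per cell."""
    return abundance(tag_count, total_tags) * transcripts_per_cell


if __name__ == "__main__":
    total = 303_706  # total tags analyzed, as stated in the text
    for count in (1, 5, 500):  # hypothetical tag counts, for illustration
        print(count, round(copies_per_cell(count, total), 2))
```

Because the total number of tags analyzed (303,706) is close to the assumed number of transcripts per cell, a tag observed once corresponds to roughly one copy per cell under this conversion.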

6. J. O. Bishop, J. G. Morton, M. Rosbash, M. Richardson, Nature 
250, 199 (1974); B. Lewin, Gene Expression Vol 2 (John Wiley and Sons, 
New York 1980). 

7. Computer simulations indicated that analysis of 300,000 tags 
would yield a 92% chance of detecting a tag for a transcript whose expression 

8. To minimize the number of assumptions and to account for the 
large number of comparisons being made, Monte Carlo analysis was used for 
relative likelihood due to chance alone ("p-chance") of obtaining a difference 
hypothesis. This likelihood was converted to an absolute probability value by 
simulating 40 experiments in which a representative number of transcripts 
(27,993 transcripts in each experiment) was identified and compared. The 
distribution of transcripts used for these simulations was derived from the 
average level of expression observed in the original samples. The distribution 
of the p-chance scores obtained in the 40 simulated experiments (false 
positives) was then compared to those obtained experimentally. Based on this 
comparison, a maximum value of 0.0005 was chosen for p-chance. This 
yielded a false positive rate that was no higher than 0.01 for the least 
significant p-chance value below the cutoff. 

9. Two hundred simulations assuming an abundance of 0.0001 in 
one sample and 0.0006 in a second sample revealed a significant difference (P 
< 0.01, [8]) 95% of the time. 

26. In the case of normal and neoplastic colon cancer tissue, 548 
differentially expressed transcripts were identified among the 36,125 unique transcripts. 

27. All references cited are hereby incorporated by reference herein. 

form SEQ ID NOS: 1-732. 

CLAIMS 

1. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 3; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be lower in the first sample than in the second 
sample. 

2. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 2; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

3. The method of claim 1 wherein a comparison of at least two of said 
transcripts is performed. 

4. The method of claim 2 wherein a comparison of at least two of said 
transcripts is performed. 

5. The method of claim 1 wherein a comparison of at least five of said 
transcripts is performed. 

6. The method of claim 2 wherein a comparison of at least five of said 
transcripts is performed. 

7. The method of claim 1 wherein a comparison of at least ten of said 
transcripts is performed. 

8. The method of claim 2 wherein a comparison of at least ten of said 
transcripts is performed. 

9. The method of claim 1 wherein a comparison of at least twenty of said 
transcripts is performed. 

10. The method of claim 2 wherein a comparison of at least twenty of said 
transcripts is performed. 

11. The method of claim 1 wherein a comparison of at least thirty of said 
transcripts is performed. 

12. The method of claim 2 wherein a comparison of at least thirty of said 
transcripts is performed. 

13. An isolated and purified human nucleic acid molecule which comprises 
a SAGE tag selected from SEQ ID NO: 1-732. 

14. The nucleic acid molecule of claim 13 which is a cDNA molecule. 

15. The nucleic acid molecule of claim 13 wherein the SAGE tag is located 
at the 3' end of the molecule, adjacent to the 3'-most NlaIII restriction enzyme 
site. 

16. An isolated nucleotide probe comprising at least 10 nucleotides of a 
human nucleic acid molecule, wherein the human nucleic acid molecule 
comprises a SAGE tag selected from SEQ ID NO: 1-732. 

17. The probe of claim 16 which comprises the selected SAGE tag. 

18. A diagnostic reagent for evaluating neoplasia of a colorectal tissue, 
comprising at least 2 probes according to claim 16. 

19. The diagnostic reagent of claim 18 which comprises at least 5 probes 
according to claim 16. 

20. The diagnostic reagent of claim 18 which comprises at least 10 probes 
according to claim 16. 

21. The diagnostic reagent of claim 18 which comprises at least 20 probes 
according to claim 16. 

22. The diagnostic reagent of claim 18 which comprises at least 30 probes 
according to claim 16. 

23. A diagnostic reagent for evaluating neoplasia of a colorectal tissue, 
comprising at least 2 probes according to claim 17. 

24. A method of diagnosing pancreatic cancer in a sample suspected of 
being neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a pancreatic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colon tissue, wherein said transcript is identified by a tag selected from the 
group consisting of those shown Table 4; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

25. A method of diagnosing cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a tissue suspected of 
being neoplastic and the second sample is of a normal human tissue of the 
same tissue type, wherein said transcript is identified by a tag selected from the 
group consisting of those shown in Table 5; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

26. A method to aid in the determination of a prognosis for a colon cancer 
patient, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic colonic 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 3; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be lower in the first sample than in the second sample. 



27. A method to aid in determining a prognosis for a patient with 
cancer, comprising the steps of: 

comparing the level of at least one transcript in a first tissue 
sample to a second sample, wherein the first sample is of a colonic cancer 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 2; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

28. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
3; 

identifying the first sample as neoplastic when the level of 
expression of the protein is found to be lower in the first sample than in the 
second sample. 

29. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
2; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

30. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic 
pancreatic tissue and the second sample is of a normal human colon tissue, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 4; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

31. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic tissue 
and the second sample is of a normal human tissue of the same tissue type, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 5; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

32. A method of diagnosing pancreatic cancer in a sample suspected of 
being neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein 
encoded by a transcript in a first sample of a tissue to a second sample, wherein 
the first sample is of a pancreatic tissue suspected of being neoplastic and the 
second sample is of a normal human colon tissue, wherein said protein is 
encoded by a transcript identified by a tag selected from the group consisting 
of those shown in Table 4; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

33. A method of diagnosing cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
tissue suspected of being neoplastic and the second sample is of a normal 
human tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

34. A method to aid in the determination of a prognosis for a colon cancer 
patient, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic colonic tissue and the second sample is of a normal human colonic 
tissue, and wherein the protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 3; 

determining a poorer prognosis if the level of expression is 
found to be lower in the first sample than in the second sample. 

35. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first tissue sample to a second sample, wherein the first sample is of a colonic 
cancer tissue and the second sample is of a normal human colonic tissue, and 
wherein the protein is encoded by a transcript identified by a tag selected from 
the group consisting of those shown in Table 2; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

36. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic pancreatic tissue and the second sample is of a normal human colon 
tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 4; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

37. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic tissue and the second sample is of a normal human tissue of the same 
tissue type, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

38. A method of treating a cancer cell, comprising the step of: 

administering to a cancer cell an antibody which specifically 
binds to a protein encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Tables 2, 4, and 5, wherein the antibody is 
linked to a cytotoxic agent. 

39. An antibody linked to a cytotoxic agent, wherein the antibody 
specifically binds to a protein encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Tables 2, 4, and 5. 

40. A method of detecting colon cancer in a patient, comprising the steps 
of: 

comparing the level of at least one protein in a first body sample 
to a second body sample, wherein the first sample is a body sample of the 
patient and the second sample is of a normal human, wherein the protein is 
encoded by a transcript identified by a tag selected from the group consisting 
of those shown in Table 2, wherein the first and second body sample is a 
sample selected from the group consisting of blood, urine, feces, sputum, and 
serum; 

identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

41. A method of detecting pancreatic cancer in a patient, comprising the 
steps of: 

comparing the level of at least one protein encoded by a 
transcript in a first sample of a tissue to a second sample, wherein the first 
sample is of the patient and the second sample is of a normal human, wherein 
said protein is encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Table 4, wherein the first and second sample 
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

42. A method of detecting cancer in a patient, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of the patient and the second sample 
is of a normal human, wherein said protein is encoded by a transcript identified 
by a tag selected from the group consisting of those shown in Table 5, wherein 
the first and second body sample is a sample selected from the group consisting 
of blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

43. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of a colonic cancer patient and the 
second sample is of a normal human, wherein the protein is encoded by a 
transcript identified by a tag selected from the group consisting of those shown 
in Table 2, wherein the first and second sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 



44. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of a pancreatic cancer patient and 
the second sample is of a normal human, wherein said protein is encoded by a 
transcript identified by a tag selected from the group consisting of those shown 
in Table 4, wherein said first and second sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 

45. A method to aid in providing a prognosis for a cancer patient, 

comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample to a second sample, wherein the first sample is of a cancer patient 
and the second sample is of a normal human, wherein said protein is encoded 
by a transcript identified by a tag selected from the group consisting of those 
shown in Table 5, wherein the first and second sample is a sample selected from 
the group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 

46. A method of detecting colon cancer in a patient, comprising the steps 
of: 

comparing the level of at least one transcript in a first body 
sample to a second body sample, wherein the first sample is a body sample of 
the patient and the second sample is of a normal human, wherein the transcript 
is identified by a tag selected from the group consisting of those shown in 
Table 2, wherein the first and second body sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

47. A method of detecting pancreatic cancer in a patient, comprising the 
steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of the patient and the 
second sample is of a normal human, wherein said transcript is identified by a 
tag selected from the group consisting of those shown in Table 4, wherein the 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

48. A method of detecting cancer in a patient, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of the patient and the second sample 
is of a normal human, wherein said transcript is identified by a tag selected 
from the group consisting of those shown in Table 5, wherein the first and second 
body sample is a sample selected from the group consisting of blood, urine, 
feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

49. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of a colonic cancer patient and the 
second sample is of a normal human, wherein the transcript is identified by a 
tag selected from the group consisting of those shown in Table 2, wherein the 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 



50. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of a pancreatic cancer patient and 
the second sample is of a normal human, wherein said transcript is identified by 
a tag selected from the group consisting of those shown in Table 4, wherein said 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 



51. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of expression of at least one transcript in 
a first sample to a second sample, wherein the first sample is of a cancer patient 
and the second sample is of a normal human, wherein said transcript is 
identified by a tag selected from the group consisting of those shown in Table 5, 
wherein the first and second sample is a sample selected from the group 
consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

52. A method for screening for candidate agents that modulate the 
expression of a polynucleotide selected from the group consisting of the 
polynucleotides in SEQ ID NOS: 1-732 or their respective complements, 
comprising contacting a test agent with a colon or pancreatic cell and 
monitoring expression of the polynucleotide, wherein the test agent which 
modifies the expression of the polynucleotide is a candidate agent. 